    Predicting Wikipedia infobox type information using word embeddings on categories

    Wikipedia has emerged as the largest multilingual, web-based general reference work on the Internet. A huge amount of human effort has been invested in the creation and updating of Wikipedia articles, which are ideally complemented by so-called infobox templates defining the type of the underlying article. This infobox type information has been observed to be often incomplete and inconsistent, for various reasons. At the same time, it plays a fundamental role for the RDF type information of Wikipedia-based knowledge graphs such as DBpedia, which creates the need for correct and complete infobox type information. In this work, we propose an approach to predict Wikipedia infobox types by using word embeddings on the categories of Wikipedia articles, and we analyze the impact of using minimal information from the articles themselves in the prediction process.
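
    The core idea admits a compact sketch. The toy code below is not the authors' implementation; the embeddings, categories, and labels are invented stand-ins for pretrained vectors and real training data. It averages the word vectors of an article's category names and assigns the infobox type of the nearest class centroid:

    ```python
    # Toy sketch: predict an infobox type from the averaged word embeddings
    # of an article's category names. EMB stands in for pretrained vectors
    # such as word2vec; the training pairs are invented.
    import numpy as np

    EMB = {
        "football": np.array([0.9, 0.1, 0.0]),
        "players":  np.array([0.8, 0.2, 0.1]),
        "rivers":   np.array([0.0, 0.9, 0.2]),
        "germany":  np.array([0.1, 0.4, 0.8]),
    }

    def embed_categories(categories):
        """Average the embeddings of all tokens in the category names."""
        vecs = [EMB[t] for c in categories for t in c.lower().split() if t in EMB]
        return np.mean(vecs, axis=0) if vecs else np.zeros(3)

    # Tiny labeled sample: article categories -> infobox type.
    train = [
        (["Football players", "Germany"], "Infobox_person"),
        (["Rivers", "Germany"], "Infobox_river"),
    ]

    # Nearest-centroid classifier over the averaged category vectors.
    grouped = {}
    for cats, label in train:
        grouped.setdefault(label, []).append(embed_categories(cats))
    centroids = {label: np.mean(vs, axis=0) for label, vs in grouped.items()}

    def predict(categories):
        x = embed_categories(categories)
        return min(centroids, key=lambda l: np.linalg.norm(x - centroids[l]))

    print(predict(["Football players"]))  # -> Infobox_person
    ```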

    DORIS: Discovering Ontological Relations In Services

    We propose to demonstrate DORIS, a system that automatically maps the schema of a Web service to the schema of a knowledge base. Given only the input type and the URL of the Web service, DORIS executes a few probing calls and deduces an intensional description of the service. In addition, it computes an XSLT transformation function that can transform a Web service call result in XML into RDF facts in the target schema. Users will be able to play with DORIS and see how real-world Web services can be mapped to large knowledge bases of the Semantic Web.
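
    As a rough illustration of the final step only, the sketch below applies a hand-written XSLT of the kind DORIS deduces to an invented XML call result, producing an RDF fact; the input layout and the target-schema URIs are made up for the example:

    ```python
    # Illustration only: an XSLT mapping turning an XML web-service result
    # into an N-Triples fact. DORIS would deduce such a mapping itself; here
    # it is written by hand over invented data and invented schema URIs.
    from lxml import etree

    xml_result = etree.XML(
        "<result><book><title>Dune</title><author>Frank Herbert</author></book></result>"
    )

    xslt = etree.XML("""\
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output method="text"/>
      <xsl:template match="/result/book">&lt;http://example.org/book/<xsl:value-of select="title"/>&gt; &lt;http://example.org/ontology/author&gt; "<xsl:value-of select="author"/>" .</xsl:template>
    </xsl:stylesheet>""")

    transform = etree.XSLT(xslt)
    print(transform(xml_result))  # one RDF fact in the target schema
    ```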

    Temporal Role Annotation for Named Entities

    Natural language understanding tasks are key to extracting structured and semantic information from text. One of the most challenging problems in natural language is ambiguity, and resolving it requires context, in particular temporal information. This paper focuses on the task of extracting temporal roles from text, e.g. CEO of an organization or head of a state. A temporal role has a domain, which may resolve to different entities depending on the context and especially on temporal information, e.g. CEO of Microsoft in 2000. We focus on temporal role extraction as a precursor for temporal role disambiguation. We propose a structured prediction approach based on Conditional Random Fields (CRF) to annotate temporal roles in text, relying on a rich feature set that extracts syntactic and semantic information. We perform an extensive evaluation on two datasets. For the first, we extract nearly 400k instances from Wikipedia through distant supervision; for the second, a manually curated ground truth of 200 instances is extracted from a sample of The New York Times (NYT) articles. Finally, the proposed approach is compared against baselines, showing significant improvements on both datasets.
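
    A minimal sketch of such a CRF tagger is shown below, using the sklearn-crfsuite package as a stand-in for whatever toolkit the authors used, with deliberately simple features and invented BIO-labeled toy data (the paper's feature set is far richer):

    ```python
    # Toy CRF sequence tagger for temporal role mentions such as "CEO of X".
    # sklearn-crfsuite is an assumed stand-in, not the authors' toolkit.
    import sklearn_crfsuite

    def features(sent, i):
        w = sent[i]
        return {
            "lower": w.lower(),
            "is_title": w.istitle(),
            "prev": sent[i - 1].lower() if i > 0 else "<s>",
            "next": sent[i + 1].lower() if i < len(sent) - 1 else "</s>",
        }

    # Invented training sentences with BIO labels marking role spans.
    sents = [
        ["Steve", "Ballmer", "was", "CEO", "of", "Microsoft", "in", "2000", "."],
        ["She", "became", "head", "of", "state", "in", "2010", "."],
    ]
    labels = [
        ["O", "O", "O", "B-ROLE", "I-ROLE", "I-ROLE", "O", "O", "O"],
        ["O", "O", "B-ROLE", "I-ROLE", "I-ROLE", "O", "O", "O"],
    ]

    X = [[features(s, i) for i in range(len(s))] for s in sents]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
    crf.fit(X, labels)

    test = ["He", "was", "CEO", "of", "Nokia", "."]
    print(crf.predict([[features(test, i) for i in range(len(test))]]))
    ```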

    "The Less Is More" for Text Classification


    Leveraging Mathematical Subject Information to Enhance Bibliometric Data

    The field of mathematics is known to be especially challenging from a bibliometric point of view. Its bibliographic metrics are especially sensitive to distortions and are heavily influenced by the subject and its popularity. Quantitative methods are therefore prone to misrepresentation and need to take subject information into account. In this paper we investigate how the mathematical bibliography of the abstracting and reviewing service Zentralblatt MATH (zbMATH) could further benefit from the inclusion of subject information from the Mathematics Subject Classification (MSC2010). Furthermore, the mappings of MSC2010 to Linked Open Data resources have been upgraded and extended to also benefit from the semantic information provided by DBpedia.
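
    As a rough illustration of what such a mapping looks like as Linked Data, the sketch below prints MSC2010-to-DBpedia links as skos:closeMatch triples; the MSC URI pattern and the specific code/resource pairs are assumptions made for the example, not the actual zbMATH mapping:

    ```python
    # Illustrative only: emitting MSC2010 -> DBpedia mappings as SKOS triples.
    # URI pattern and pairs are assumed for the example.
    MSC_TO_DBPEDIA = {
        "05Cxx": "http://dbpedia.org/resource/Graph_theory",
        "11Axx": "http://dbpedia.org/resource/Number_theory",
        "68Txx": "http://dbpedia.org/resource/Artificial_intelligence",
    }

    SKOS_CLOSE_MATCH = "http://www.w3.org/2004/02/skos/core#closeMatch"

    for code, resource in MSC_TO_DBPEDIA.items():
        msc_uri = f"http://msc2010.org/resources/MSC/2010/{code}"  # assumed pattern
        print(f"<{msc_uri}> <{SKOS_CLOSE_MATCH}> <{resource}> .")
    ```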

    SOFYA: Semantic on-the-fly Relation Alignment

    Recent years have seen the rise of Web data, in particular Linked Data, with, by now, more than 1000 datasets in the Linked Open Data (LOD) Cloud. These datasets are mostly entity-centric and highly heterogeneous in terms of domain, language, schema, etc. Hence, the vision of uniformly querying the resources in the LOD Cloud still has a long way to go. While equivalent entity instances across datasets are often connected by sameAs links, relations from different datasets and schemas are usually not aligned. In this paper, we propose an online instance-based relation alignment approach. The alignment can be performed during query execution and requires only partial information from the datasets. We align relations to a target dataset using association rule mining, sampling for equivalent entity instances with two main sampling strategies. Preliminary experiments show that we are able to align relations with high accuracy, even when accessing the entire datasets is impossible or impractical.
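
    The instance-based idea can be sketched on invented data (this is not the SOFYA implementation): for entities linked across the datasets, relation pairs that share object values become alignment candidates, scored by rule confidence:

    ```python
    # Simplified instance-based relation alignment over invented triples:
    # relation pairs co-occurring on the same linked entity with the same
    # value are candidates; keep those with high rule confidence.
    from collections import Counter

    # (entity, relation, value) triples from a source and a target dataset,
    # restricted to entities linked by sameAs (here: identical keys).
    source = [("berlin", "locatedIn", "germany"),
              ("paris", "locatedIn", "france")]
    target = [("berlin", "country", "germany"),
              ("paris", "country", "france"),
              ("paris", "mayor", "hidalgo")]

    support = Counter()    # co-occurrences of (source rel, target rel)
    rel_count = Counter()  # occurrences of each source relation
    for e1, r1, v1 in source:
        rel_count[r1] += 1
        for e2, r2, v2 in target:
            if e1 == e2 and v1 == v2:
                support[(r1, r2)] += 1

    for (r1, r2), s in support.items():
        confidence = s / rel_count[r1]
        if confidence >= 0.8:
            print(f"{r1} -> {r2} (confidence {confidence:.2f})")
    ```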

    TableNet: An approach for determining fine-grained relations for Wikipedia tables

    We focus on the problem of interlinking Wikipedia tables with the fine-grained table relations equivalent and subPartOf. Such relations allow us to harness semantically related information by accessing related tables or the facts therein. Determining the type of a relation is not trivial: relations depend on the schemas, the cell values, and the semantic overlap of the cell values in the tables. We propose TableNet, an approach for interlinking tables with subPartOf and equivalent relations. TableNet consists of two main steps: (i) for any source table, an efficient algorithm finds candidate related tables with high coverage, and (ii) a neural approach determines the fine-grained relation with high accuracy, based on the table schemas and data. In an extensive evaluation with more than 3.2M tables, we show that TableNet retains more than 88% of relevant table pairs and assigns table relations with an accuracy of 90%.
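
    Step (i) can be illustrated with a cheap schema-overlap filter; the sketch below uses Jaccard similarity of column names over toy tables and is only a stand-in for the paper's coverage-oriented candidate algorithm. The neural relation classifier of step (ii) is not reproduced:

    ```python
    # Toy candidate-table retrieval via Jaccard overlap of column names.
    def jaccard(a, b):
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    TABLES = {
        "t1": ["country", "capital", "population"],
        "t2": ["country", "capital"],
        "t3": ["player", "club", "goals"],
    }

    def candidates(source, threshold=0.4):
        """Return tables whose schemas overlap enough with the source table."""
        src = TABLES[source]
        return [t for t in TABLES
                if t != source and jaccard(src, TABLES[t]) >= threshold]

    print(candidates("t1"))  # -> ['t2']
    ```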

    TECNE: Knowledge based text classification using network embeddings

    Text classification is an important and challenging task due to its applications in various domains, such as document organization and news filtering. Several supervised learning approaches have been proposed for text classification, but most of them require a significant amount of training data, and manually labeling such data can be very time-consuming and costly. To overcome the need for labeled data, we demonstrate TECNE, a knowledge-based text classification method using network embeddings. The proposed system does not require any labeled training data to classify an arbitrary text. Instead, it relies on the semantic similarity between the entities appearing in a given text and a set of predefined categories to determine the category to which the document belongs.
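
    The underlying idea lends itself to a short sketch, with toy vectors standing in for the actual network embeddings and for the real entity and category sets:

    ```python
    # Toy sketch of the TECNE idea (not the demonstrated system): represent a
    # document by the embeddings of its entities and assign the most similar
    # predefined category, with no labeled training data.
    import numpy as np

    ENTITY_EMB = {
        "Lionel_Messi": np.array([0.9, 0.1]),
        "FC_Barcelona": np.array([0.8, 0.2]),
    }
    CATEGORY_EMB = {
        "Sports":   np.array([0.9, 0.2]),
        "Politics": np.array([0.1, 0.9]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def classify(entities):
        # Average the entity embeddings to get a document representation.
        doc = np.mean([ENTITY_EMB[e] for e in entities], axis=0)
        return max(CATEGORY_EMB, key=lambda c: cosine(doc, CATEGORY_EMB[c]))

    print(classify(["Lionel_Messi", "FC_Barcelona"]))  # -> Sports
    ```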

    Machine Learning against Hearing Loss: Predicting the Success of Cochlear Implant Treatment

    So-called cochlear implants are not yet very widespread among people with hearing loss, in part because the degree of speech comprehension achievable with an implant is hard to estimate before the operation. In a project funded by the VW-Stiftung, researchers at the Medizinische Hochschule Hannover (MHH), the Technische Universität Braunschweig, and the Forschungszentrum L3S want to analyze patient data in order to better predict the success of cochlear implantation.

    Approaches Towards Unified Models for Integrating Web Knowledge Bases

    My thesis aims at the automatic integration of new Web services into a knowledge base. For each method of a Web service, a view is computed automatically. The view is represented as a query on the knowledge base. The algorithm we propose also computes an XSLT transformation function, associated with the method, that can transform call results into a fragment conforming to the schema of the knowledge base. The novelty of our approach is that the alignment relies on instance alignment alone; it depends neither on the names of the concepts nor on the constraints defined by the schema. This makes it particularly relevant for the Web services currently published on the Web, because these services use the REST protocol, which does not allow schemas to be published. Moreover, JSON seems to be establishing itself as the standard for representing service call results, and unlike XML, JSON does not use named nodes, so traditional alignment algorithms are deprived of the concept names on which they rely.
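
    The JSON point can be seen directly in code; the example data below is invented:

    ```python
    # The same service result in XML and in JSON: the XML names its concepts
    # ("title", "author"), while a JSON array may carry no node names at all,
    # leaving instance values as the only signal an alignment algorithm can use.
    import json
    import xml.etree.ElementTree as ET

    xml_doc = ET.fromstring(
        "<book><title>Dune</title><author>Frank Herbert</author></book>"
    )
    print([child.tag for child in xml_doc])  # ['title', 'author'] - named nodes

    json_doc = json.loads('["Dune", "Frank Herbert"]')
    print(json_doc)  # positions only - no concept names to match on
    ```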